March 1st, ’23

Overview

1 | background

2 | questions

3 | approaches

4 | methods

5 | results

6 | discussion

7 | future

8 | conclusions

Cercocarpus

Team

Jane Ogilvie (ecological fieldwork, design)

Emily J. Woodworth (pollen morphology and microscopy)

Sophie Taddeo (geo-spatial, statistics)

Paul CaraDonna (little bees/big picture(s))

Jeremie Fant (all things molecular and tied together)

i.e. an in-house production.

the world is big - 1.1

Primary Roads

… really really big - 1.2

  • 5 sampling seasons (May - October)
  • 3 person crews
  • 2 partial support personnel
  • 281 plots
  • area of inference: ~ 900,000 acres
  • 0.363% of Bureau of Land Management administered land
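The last two bullets together imply the size of the whole. A back-of-envelope check, using only the approximate figures on this slide:

```python
# Back-of-envelope check using the two approximate figures on this slide.
area_of_inference_acres = 900_000  # "~ 900,000 acres"
share_of_blm_land = 0.363 / 100    # "0.363% of BLM administered land"

# If ~900,000 acres is 0.363% of BLM land, the implied BLM total is:
implied_blm_total_acres = area_of_inference_acres / share_of_blm_land
print(f"implied BLM total: {implied_blm_total_acres / 1e6:.1f} million acres")
```

That is roughly 248 million acres, close to the roughly 245 million acres usually cited for BLM-administered land; the gap is within the rounding of the "~ 900,000" input.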

FQI calculated from AIM

funding opportunities - 1.3

how do we sample the planet? - 1.4

plant species in ecology - 1.6

  • mis-identification is very common
  • mis-identification can lead to nebulous understandings
  • mis-identification can lead to mis-management

Cirsium scariosum

insect species in ecology - 1.7

  • Macroinvertebrates
    • stream ecology bioindicators
      • mayflies, caddisflies, stoneflies
  • Coleoptera
    • soil contamination by metals
  • from bioindicators to foci?

Macroinvertebrates.org

from organisms to interactions - 1.8

Solidago spathulata & Megachile wheeleri, by A. Litz

metabarcoding - 1.10

  • Barcoding
    • molecular identification of tissue from a single organism
  • Metabarcoding
    • molecular identification of organisms present in a mixed substrate

Five Astragali

barcodes - 1.11

  • Kingdom: Animal
    • COI (Cytochrome c Oxidase subunit I)
    • holding its own in Animals
  • Kingdoms: Fungi + Plant
    • ITS (Internal Transcribed Spacer)
    • holding its own in Fungi
  • Kingdom: Plant
    • ITS, rbcL, matK, trnH-psbA
    • not holding much of anything

new barcodes for plants? - 1.12

  • genomics
    • low cost
    • high coverage
    • PCR free?
  • reference library?
    • old barcode library in development for nearly 20 years
    • Kew PAFTOL
  • Angiosperms353

2 - questions

scale - from plots to continents?

  • many questions will be approached from two perspectives
    • bottom-up, i.e. plot-based data collected by Jane
    • top-down, i.e. computer-based data generated by me
  • fine-scale data serve as ground truth for the computer-generated models

can we predict what is flowering in time & space? - 2.1

  • which species are present in an area?
  • when are these species flowering in an area?
  • diverse clades provide challenges for identification
  • species often diverged in ecological traits

Rhododendron sp. Hengduan Mtns., by Qin Li

do a353 work as barcodes? - 2.2

  • ‘universal’ markers for phylogenomics
  • usable in all flowering plant clades
  • first comprehensive genus-level phylogeny of flowering plants
  • shoot the moon; metagenomics first

are a353 semi-quantitative? - 2.3

Does the number of sequence reads reflect the amount of biological material in a sample?
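One way to put a number on this question, sketched here as an assumption rather than the team's method, is a rank correlation between each taxon's share of reads and its share of counted grains in the same corbicular load. The taxa and counts below are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def proportions(counts):
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


# Hypothetical single load: reads assigned per taxon vs. grains counted.
reads = {"Astragalus": 900, "Cirsium": 300, "Delphinium": 3100, "Mertensia": 5200}
grains = {"Astragalus": 260, "Cirsium": 15, "Delphinium": 120, "Mertensia": 410}

taxa = sorted(reads)
read_share, grain_share = proportions(reads), proportions(grains)
rho = spearman([read_share[t] for t in taxa], [grain_share[t] for t in taxa])
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation only asks whether more reads go with more grains, which matches the "semi-quantitative" framing; it does not test whether read share equals grain share one-to-one.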

3 - approaches

predict what is flowering; Time & Space - 3.1

  • no longer any funding for floristics; few Floras maintained, fewer written
  • essentially no funding remains for alpha taxonomy
  • little to no funding for natural history
  • how do we monitor ecological shifts under climate change?
    • geographic ranges
    • flowering time
  • back to the sheets!

FPNW 2nd

custom sequence databases; a353 as barcodes? - 3.2

  • reduce number of species present in database
    • reduce computational requirements
    • increase likelihood of relevant matches across loci
    • reduce false positives for semi-quantitative inference
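The database-reduction idea above can be sketched as a simple filter of a reference library against a local candidate list. The species labels, locus ids and sequences are placeholders, not real Angiosperms353 data, and the candidate list is assumed to come from the spatial and temporal modelling steps.

```python
# Hypothetical reference library: species -> {locus id: sequence}.
# Names, locus ids and sequences are placeholders only.
full_reference = {
    "Mertensia sp. 1": {"locus_A": "ACGTTGCA", "locus_B": "TTGACCGA"},
    "Mertensia sp. 2": {"locus_A": "ACGTTGCT"},
    "Delphinium sp. 1": {"locus_A": "GGCATTAC", "locus_B": "TTGACGGA"},
    "Delphinium sp. 2": {"locus_A": "GGCATTAA"},
    "Astragalus sp. 1": {"locus_B": "CCATGGTA"},
}

# Hypothetical candidate list for the study area.
candidate_species = {"Mertensia sp. 1", "Delphinium sp. 1", "Astragalus sp. 1"}


def build_custom_db(reference, candidates):
    """Restrict the reference library to species plausibly present at the site."""
    return {sp: loci for sp, loci in reference.items() if sp in candidates}


custom_db = build_custom_db(full_reference, candidate_species)
n_full = sum(len(loci) for loci in full_reference.values())
n_custom = sum(len(loci) for loci in custom_db.values())
print(f"{len(custom_db)}/{len(full_reference)} species, "
      f"{n_custom}/{n_full} reference sequences retained")
```

Removing near-identical congeners that cannot occur locally leaves fewer references competing for the same read, which is where the reduction in false positives is expected to come from.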

queen bee pollen loads: a353 as barcodes? - 3.3

  • DNA extracted from corbiculae loads
    • a ‘pollen basket’ for holding collected grains
  • variable in size, but generally many tens of thousands of grains

USFWS

identify pollen grains; a353 semi-quantitative? - 3.4

4 - methods

  • field work
  • spatial
  • temporal
  • morphologic
  • laboratory
  • bioinformatic
  • post-classification

study system & field work - 4.1

pollen morphological identification - 4.2

workflow

pollen reference library - 4.2.1

  • ca. 110 species
  • 60 novel species added
  • ca. 1/3 of species with duplicate preparations
  • shared
  • many more species to add to key!! (60+, mostly from unsampled families)

pollen corbiculae loads - 4.2.2

  • aliquot from same sample used for molecular
  • stained with fuchsin jelly, with stirring
  • transects
  • rarefaction curves
    • richness
    • abundance
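A rarefaction curve for richness can be computed directly from grain counts per pollen type. This sketch assumes Hurlbert's analytical rarefaction (sampling grains without replacement); the team may instead have resampled transects, and the counts are invented.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of pollen types in a random subsample of n grains
    (Hurlbert's rarefaction, sampling without replacement)."""
    total_grains = sum(counts)
    if not 0 <= n <= total_grains:
        raise ValueError("n must be between 0 and the total number of grains")
    ways = comb(total_grains, n)
    # each type contributes 1 - P(none of its grains are drawn)
    return sum(1 - comb(total_grains - c, n) / ways for c in counts if c > 0)


# Hypothetical grain counts per pollen type along the transects of one load.
grain_counts = [120, 45, 30, 4, 1]
for n in (1, 10, 50, 100, 200):
    print(n, round(rarefied_richness(grain_counts, n), 2))
```

Where the curve flattens well before the full count, further transects add little richness; rare types (the 4- and 1-grain classes here) are what keep it rising.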

Corbiculae Sample

molecular barcoding - 4.3

  • Angiosperms 353…

spatial analysis - 4.3.1

  • 2-stage approach
    • 1st: distance search of records from museums & plot-based data (e.g. Forest Service)
    • 2nd: species distribution modelling
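The first stage, a distance search of occurrence records, can be sketched with great-circle (haversine) distances. The site coordinates, records and search radius are hypothetical, and a spherical Earth is assumed.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def records_within(records, site_lat, site_lon, radius_km):
    """Stage 1: keep occurrence records within radius_km of the site."""
    return [r for r in records
            if haversine_km(site_lat, site_lon, r["lat"], r["lon"]) <= radius_km]


# Hypothetical site coordinates, records and radius (not project values).
site = (38.50, -107.00)
records = [
    {"taxon": "A", "lat": 38.50, "lon": -107.00, "source": "herbarium"},
    {"taxon": "B", "lat": 38.80, "lon": -107.10, "source": "plot"},
    {"taxon": "C", "lat": 40.00, "lon": -105.00, "source": "herbarium"},
]
nearby = records_within(records, *site, radius_km=50)
print([r["taxon"] for r in nearby])
```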

plant species distribution modelling - 4.3.1.1

  • develop a candidate species list for barcoding
  • download all herbarium records within a search distance extending beyond the study area
  • compare to known species at field site
  • logistic regression
  • bootstrapped samples of records
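The slide does not say which predictors enter the logistic regression. As a hypothetical sketch only, assume the predictor is each candidate species' nearest-record distance to the site and the response is whether it was recorded on the plots; refitting on bootstrap resamples shows how stable the fitted slope is. For simplicity this resamples species rows rather than the underlying records, and a real analysis would use an established statistics package rather than hand-rolled gradient descent.

```python
import math
import random


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.1, epochs=1000):
    """One-predictor logistic regression by batch gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def bootstrap_slopes(x, y, n_boot=100, seed=1):
    """Refit on (x, y) pairs resampled with replacement; return sorted slopes."""
    rng = random.Random(seed)
    idx = range(len(x))
    slopes = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]
        slopes.append(fit_logistic([x[i] for i in sample], [y[i] for i in sample])[1])
    return sorted(slopes)


# Hypothetical data: nearest-record distance to the site (units of 10 km) for
# candidate species, and whether each species was recorded on the field plots.
dist = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]
on_plot = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(dist, on_plot)
slopes = bootstrap_slopes(dist, on_plot)
lo, hi = slopes[int(0.05 * len(slopes))], slopes[int(0.95 * len(slopes)) - 1]
print(f"slope = {b1:.2f}, bootstrap ~90% range = ({lo:.2f}, {hi:.2f})")
```

A negative slope means presence on plot becomes less likely as the nearest record moves farther away; species can then be ranked by fitted probability to build the candidate list.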

species distribution modelling - 4.3.1.2

sdm evaluations - 4.3.1.2

  • True Skill Statistic (TSS) used in the pipeline
    • works well over wide range of occurrence records
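The True Skill Statistic is sensitivity + specificity - 1, computed from a presence/absence confusion matrix. A minimal sketch with hypothetical counts:

```python
def true_skill_statistic(tp, tn, fp, fn):
    """TSS = sensitivity + specificity - 1, ranging from -1 to +1 (0 = no skill)."""
    sensitivity = tp / (tp + fn)   # share of presences predicted present
    specificity = tn / (tn + fp)   # share of absences predicted absent
    return sensitivity + specificity - 1


# Hypothetical confusion-matrix counts for one model.
tss = true_skill_statistic(tp=80, tn=70, fp=30, fn=20)
print(f"TSS = {tss:.2f}")  # sensitivity 0.80 + specificity 0.70 - 1
```

Because it combines the two per-class rates rather than raw accuracy, TSS is less driven by how many presence vs. absence records a species happens to have.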

temporal modelling - 4.3.2

  • reduce herbarium records to study domain
  • thin records to analogous ecoregions
  • trim start/end records
  • identify major phenological cues, subset records to similar areas
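The trimming step can be sketched as dropping a fixed fraction of the earliest and latest specimen dates before reading off first, peak and last flowering. The 10% trim fraction, the use of the median as a stand-in for "peak", and the dates are all assumptions for illustration.

```python
import statistics


def trim_tails(days, fraction=0.1):
    """Drop the earliest and latest `fraction` of records (assumed outliers)."""
    s = sorted(days)
    k = int(len(s) * fraction)
    return s[k:len(s) - k] if k else s


def flowering_summary(days, fraction=0.1):
    """First, peak and last flowering day of year from trimmed specimen dates.
    'Peak' is approximated by the median, a simplifying assumption."""
    kept = trim_tails(days, fraction)
    return {"first": kept[0], "peak": statistics.median(kept), "last": kept[-1]}


# Hypothetical day of year of flowering herbarium specimens for one species,
# already reduced to the study domain and analogous ecoregions.
specimen_doy = [95, 150, 155, 160, 162, 165, 168, 170, 172, 175, 178, 180, 185, 190, 250]
print(flowering_summary(specimen_doy))
```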

temporal modelling subset - 4.3.2.1

SPATIAL SUBSET PICTURE

temporal modelling distributions - 4.3.2.2

barcode reference library - 4.4

genomics work - 4.4.1

  • Plant Reference Library
    • herbarium & silica dried
    • CTAB, some DNeasy
  • Pollen Extraction
    • ‘novel’ CTAB / SDS extraction
  • Both
    • clean up Cytiza, size selection SPRI
    • enzymatic fragmentation

plant genomic reference DNA - 4.4.2

  • 38 species sent for sequencing
  • 13 species in duplicate
  • 24 silica gel dried, 14 herbarium leaf tissue (RM, ID, IDS)

tissue from Rocky Mountain Herbarium

pollen genomic DNA - 4.4.3

  • 54 initial samples for extraction
  • 44 samples underwent all steps and were analysed

hyb-seq

barcoding informatics - 4.4.4

  • Trimmomatic: remove tags, select sequences > 31 bp in length
  • Kraken - qualitative identification
  • Bracken - quantitative identification
  • BLAST followup
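To illustrate the idea behind the Kraken step (exact k-mer matching of reads against a reference), here is a much-simplified sketch: a length filter, then a majority vote over k-mers unique to one taxon. This is not Kraken's algorithm, which assigns shared k-mers to the lowest common ancestor and scores paths through the taxonomy; the reference, reads, k and minimum length are toy values.

```python
from collections import Counter


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference, k):
    """Map each k-mer to the single taxon containing it. Shared k-mers are
    dropped here; Kraken instead assigns them to the taxa's lowest common ancestor."""
    owners = {}
    for taxon, seq in reference.items():
        for km in kmers(seq, k):
            owners.setdefault(km, set()).add(taxon)
    return {km: next(iter(t)) for km, t in owners.items() if len(t) == 1}


def classify(read, index, k, min_len):
    """Length filter, then majority vote of the read's diagnostic k-mers."""
    if len(read) < min_len:  # plays the role of discarding reads <= 31 bp
        return None
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Toy reference and reads; real tools use much longer k-mers and references.
reference = {"Taxon_A": "ACGTACGTTGCAAGT", "Taxon_B": "TTGCCAGTACCGGTA"}
K, MIN_LEN = 5, 10
index = build_index(reference, K)
for read in ["ACGTACGTTGCA", "GCCAGTACCGGT", "ACGTA", "GGGGGGGGGGGG"]:
    print(read, classify(read, index, K, MIN_LEN))
```

Exact k-mer matching against a small reference is fast but, as the results slides note, prone to false positives when congeners share most of their sequence; that is the motivation for the BLAST follow-up.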

Cirsium scariosum

metabarcoding - 4.5

sequence database generation - 4.5.1

  • Kew Tree of Life ~ ### taxa
  • US ~ xx TAXA

sequence assignment - 4.5.2

  • initial BLAST query
  • removal of proxy species from DB, and insertion of local species
  • filter matches based on flowering phenology
  • one big function in an order of operations (‘algo’)
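The order of operations above can be sketched as a single post-classification function. Everything here is hypothetical: the rule order, the tie handling, and the placeholder taxa with invented flowering windows (day of year). Hits are assumed to be already parsed from BLAST output into taxon and bit score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    taxon: str        # "Genus species" label of the reference matched
    bitscore: float   # BLAST bit score of the match


def post_classify(hits, local_species, flowering_windows, sample_doy):
    """Hypothetical order of operations: keep hits to locally present species that
    are in flower on the sampling day, then call the best match at species rank,
    or at genus rank if the best-scoring species tie within one genus."""
    plausible = []
    for h in hits:
        window = flowering_windows.get(h.taxon)
        if h.taxon in local_species and window and window[0] <= sample_doy <= window[1]:
            plausible.append(h)
    if not plausible:
        return None, "none met"
    best = max(h.bitscore for h in plausible)
    top = {h.taxon for h in plausible if h.bitscore == best}
    if len(top) == 1:
        return top.pop(), "species"
    genera = {t.split()[0] for t in top}
    if len(genera) == 1:
        return genera.pop(), "genus"
    return None, "multiple"


# Placeholder taxa and flowering windows, invented for illustration.
local = {"Mertensia sp. 1", "Mertensia sp. 2", "Delphinium sp. 1"}
windows = {"Mertensia sp. 1": (160, 230), "Mertensia sp. 2": (120, 165),
           "Delphinium sp. 1": (120, 170)}
hits = [Hit("Mertensia sp. 1", 480.0), Hit("Mertensia sp. 2", 480.0),
        Hit("Delphinium sp. 1", 300.0), Hit("Mertensia sp. 9", 500.0)]

print(post_classify(hits, local, windows, sample_doy=200))
print(post_classify(hits, local, windows, sample_doy=162))
```

The non-local "Mertensia sp. 9" has the highest score but is discarded first, mirroring the removal of proxy species; on day 162 two local congeners are both in flower and tie, so the call drops to genus.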

Reclassification table

semi-quantitative evidence - 4.5.3

stuff

5

results

field work

  • 723 floral visitation observations (!)
  • 36 unique plant species involved
  • 64 corbiculae loads from queens

sdm candidate species ????

  • downloaded some 112k records
  • mostly trees from forestry surveys
  • bootstrap re-sampled to reduce effects of collection ‘hotspots’
  • taxa not present on site appear almost immediately as search distance increases…
  • real occurrences taper off quickly

database

sdm evaluations - computational - 5.2

Logistic regression assessing accuracy of SDMs; withheld data

Metric                 Value (%)    Metric         Value
Accuracy (training)    83.75        F-score        0.84
Accuracy (test)        84.00        AUC            0.92
Recall                 81.03        Concordance    0.92
True negative rate     86.97        Discordance    0.08
Precision              88.04        Tied           0.00
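The F-score in the table follows from its precision and recall, since F = 2PR / (P + R). A quick consistency check using the table's values:

```python
# Values copied from the table (percentages converted to proportions).
precision = 0.8804
recall = 0.8103

# F-score is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)
print(f"F-score = {f_score:.2f}")
```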

sdm evaluations - 5.3.1

                   ML     LM
ensembles          493    473
true positives     362    286
true negatives      33     55
false positives     64     41
false negatives     34     93

sdm evaluations - 5.3.2

  • We were interested in comparison to the Valleys.
  • Plot Level, 117 species total (109 eligible for modelling…)
    • ML: 105 (89.7% of all; 96.3% of eligible)
    • LM: 102 (87.2% of all; 93.5% of eligible)
  • Able to detect virtually all species recorded on plot

coarse phenological modelling - 5.4.1

  • strong agreement between first and peak flower periods with historic data
  • good agreement between last flower date
  • no agreement with duration! - species do not ‘line up’

flower dates

coarse phenological modelling - 5.4.2

  • similar results with weekly data across all field sites combined
  • tau values lower than for longer-term data
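Assuming the tau values are Kendall's rank correlation between historic and observed flowering dates, a minimal sketch with hypothetical first-flowering days for six species:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Ties count as neither; tau-b would adjust the denominator for ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical first-flowering day of year for six species.
historic = [140, 152, 160, 171, 185, 198]   # from herbarium records
observed = [138, 158, 155, 175, 190, 201]   # from weekly field surveys
tau = kendall_tau_a(historic, observed)
print(f"tau = {tau:.3f}")
```

Because only ranks enter, tau asks whether species flower in the same order, not on the same dates: a uniform shift of every species leaves tau unchanged.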



flower dates

metabarcoding - 5.5

sequence database generation - 5.6

  • found existing data for 130 species on NCBI SRA
  • novel sequence data for 25 species, varying number of loci
  • whole ‘ring’ to be completed within the year

Species in Sequence DB

sequence assignment - 5.7 - I

  • Trimmomatic (discard short reads)
  • Kraken (many false positives)
  • Bracken (many, many false positives)
  • BLAST (fewest false positives)

Three Initial Networks

sequence assignment - 5.7 - II

Post-classification of sequences via taxonomy and ecology, top 15 most abundant reads

Condition   No. classified   % classified   % of total seqs   Rank
A           143              21.0           32.0              Species
B           205              30.1           10.5              Species
C             5               0.7            0.4              Genus
G            29               4.3            7.8              Species
H           280              41.2           47.9              Genus
None met     18               2.6            1.4              Multiple

sequence assignment - 5.7 - III

  • naive BLAST against custom databases: 26% accuracy
  • BLAST post-classified with temporal filters, so that each genus is represented by a single species in space and time: 44% accuracy
  • BLAST creates many false positives

sequence assignment - 5.7 - IV

  • conceptually similar to the automated process
  • utilized high resolution occurrence and phenology data
  • utilized morphological and molecular data
  • no linear operation or rule of precedence
  • classified all sequences to species

semi-quantitative evidence - 5.8

  • info
  • some relationship exists
  • requires further work by someone else

Counted grains

final floral feeding structure - 5.9

less stuff

Discussion - 6

stuff

conservation implications - 6.1

  • a hidden plain text paragraph in the discussion.
  • collaboration with Ken Holsinger, Jedd Sondergard @ BLM Montrose

conservation implications - 6.1 - II

  • historic vegetation treatment removals
    • Delphinium, Astragalus & Oxytropis
  • altered fire cycle
    • Delphinium, Mertensia
  • stream channelization / wetland removal
    • Mertensia
  • seed species, where missing
  • allow the historic fire cycle to return
  • reintroduce beavers

7

future

metabarcoding; computational approaches - 7.2

  • qualitative:
    • search for variable loci
    • flanking regions and pop gen

  • quantitative:
    • read re-assignment based on phylogenetic distance
    • read re-assignment in a Bayesian framework
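One common starting point for re-assigning ambiguous reads is expectation-maximization (EM) for mixture proportions. This sketch gives maximum-likelihood estimates, not a full Bayesian treatment (adding a Dirichlet prior would give a MAP variant), assumes every candidate matches a read equally well, and ignores phylogenetic distance; the reads are hypothetical.

```python
def em_abundances(read_candidates, n_iter=200):
    """Estimate taxon proportions when some reads match several taxa equally well.

    read_candidates: one set of candidate taxa per read.
    E-step: split each read among its candidates in proportion to current estimates.
    M-step: new estimate = expected reads per taxon / total reads.
    """
    taxa = sorted(set().union(*read_candidates))
    pi = {t: 1.0 / len(taxa) for t in taxa}   # start from equal proportions
    n_reads = len(read_candidates)
    for _ in range(n_iter):
        expected = dict.fromkeys(taxa, 0.0)
        for cands in read_candidates:
            z = sum(pi[t] for t in cands)
            for t in cands:
                expected[t] += pi[t] / z
        pi = {t: expected[t] / n_reads for t in taxa}
    return pi


# Hypothetical reads: 6 unique to A, 2 unique to B, 4 ambiguous between A and B.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
estimates = em_abundances(reads)
print({t: round(p, 3) for t, p in estimates.items()})
```

The ambiguous reads end up split 3:1 in favour of A, following the ratio of uniquely assigned reads, rather than being split evenly or all given to one taxon.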

metabarcoding; new data sets? - 7.1

  • artificial mixtures
    • leaf tissue (counted cells)
    • pollen loads (counted grains)
  • Gunnison Sage-Grouse scat
    • BLM habitat assessment data
    • opportunistic collections

Bombus; trends in perennial bunchgrasses - 7.3

8

conclusions

promising

acknowledgements

  • Dani Yashinowitz B.S. SUNY (Yellowstone National Park, botanist & crew lead, Whitebark Pine Surveys (!!!))
  • Hannah Lovell B.S. DU (Telluride Mountain Resort, and in search of work!!)

Dani Selfie

acknowledgements

Employment: Yingying Xie (NU), Josh Scholl (NU), Sam Isham (UM), Kelly McMillen (UM), Kay Hajek (UM), Linda Vance (UM), Cassandra Owen (SCC), Ken Holsinger (BLM)

Project: Nyree Zerega, Pat Herendeen, Hilary Noble, Zoe Diaz-Martinez, Angela McDonnell, Elena Loke, Ian Breckheimer, Ben Legler, Ernie Nelson, Charles (Rick) Williams, D. Knoke, L. Brummer, J. Boyd, C. Davidson, I. Gilman, M. Kirkpatrick, S. McCauley, J. Smith, K. Taylor, & C. Williams. David Giblin, Mare Nazaire, Sarah Burnett, Lauren Price, T.C.H. Cole, Eliot Gardner.